Jeffrey Epstein's Ties to CBP Agents Sparked a DOJ Probe

WIRED

Documents say customs officers in the US Virgin Islands had friendly relationships with Epstein years after his 2008 conviction, showing how the infamous sex offender tried to cultivate allies. United States prosecutors and federal law enforcement spent over a year examining ties between Jeffrey Epstein and Customs and Border Protection officers stationed in the US Virgin Islands (USVI), according to documents recently released by the Department of Justice. As The Guardian and The New York Times have reported, emails, text messages, and investigative records show that Epstein cultivated friendships with several officers, entertaining them on his island and offering to take them on whale-watching trips in his helicopter. He even brought one officer cannolis for Christmas Eve. In turn, Epstein would bring certain officers his complaints about his treatment at the hands of other CBP and federal agents.



Risk Prediction of Cardiovascular Disease for Diabetic Patients with Machine Learning and Deep Learning Techniques

Chowdhury, Esha

arXiv.org Artificial Intelligence

Accurate prediction of cardiovascular disease (CVD) risk is crucial for healthcare institutions. This study addresses the growing prevalence of diabetes and its strong link to heart disease by proposing an efficient CVD risk prediction model for diabetic patients using machine learning (ML) and hybrid deep learning (DL) approaches. The BRFSS dataset was preprocessed by removing duplicates, handling missing values, identifying categorical and numerical features, and applying Principal Component Analysis (PCA) for feature extraction. Several ML models, including Decision Trees (DT), Random Forest (RF), k-Nearest Neighbors (KNN), Support Vector Machine (SVM), AdaBoost, and XGBoost, were implemented, with XGBoost achieving the highest accuracy of 0.9050 among the ML models. Various DL models, such as Artificial Neural Networks (ANN), Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Gated Recurrent Unit (GRU), as well as hybrid models combining CNN with LSTM, BiLSTM, and GRU, were also explored. Some of these models achieved perfect recall (1.00), with the LSTM model achieving the highest accuracy of 0.9050 among the DL models. Our research highlights the effectiveness of ML and DL models in predicting CVD risk among diabetic patients, automating and enhancing clinical decision-making. High accuracy and F1 scores demonstrate these models' potential to improve personalized risk management and preventive strategies.
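
As a rough illustration of the pipeline the abstract describes (deduplication, imputation, PCA, then a boosted-tree classifier), here is a minimal sketch. The file name, target column, and hyperparameters are hypothetical stand-ins, not the authors' configuration:

```python
# Minimal sketch of the abstract's pipeline: dedupe, impute, PCA, XGBoost.
# "brfss_diabetic_subset.csv" and "HeartDiseaseorAttack" are assumed names.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

df = pd.read_csv("brfss_diabetic_subset.csv").drop_duplicates()
X = df.drop(columns=["HeartDiseaseorAttack"])  # hypothetical target column
y = df["HeartDiseaseorAttack"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),  # retain 95% of the variance
    ("clf", XGBClassifier(n_estimators=300, eval_metric="logloss")),
])
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1:", f1_score(y_test, pred))
```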


Evaluation of A Spatial Microsimulation Framework for Small-Area Estimation of Population Health Outcomes Using the Behavioral Risk Factor Surveillance System

Von Hoene, Emma, Gupta, Aanya, Kavak, Hamdi, Roess, Amira, Anderson, Taylor

arXiv.org Artificial Intelligence

The field of population health addresses a wide spectrum of challenges, spanning infectious and chronic diseases to mental health and health risk behaviors such as smoking and alcohol consumption (Sharma et al., 2025). A common barrier to addressing these issues is the lack of ground truth data capturing health outcomes and behaviors at fine geographic scales. This limits both local and national health decision-makers in planning and management efforts, such as identifying health inequalities or targeting interventions where they are most needed (Rahman, 2017; Wang, 2018). To fill this gap, researchers use small area estimation (SAE), a collection of statistical methods that combine survey and geographic data to generate estimates of population-level health outcomes at various spatial scales (RTI International, 2025). There are numerous methods for generating SAE of health outcomes, which can generally be grouped into two main approaches: direct and indirect model-based estimates (Rahman, 2017). Direct estimates are calculated using only the survey responses from individuals or households sampled within the specified geographic areas (counties, states) to estimate disease prevalence or other population characteristics.
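
To make the "direct estimate" approach concrete, here is a minimal sketch of a survey-weighted prevalence computed separately per area. The column names and weights are invented for illustration and do not reflect the actual BRFSS schema:

```python
# Direct small-area estimate: weighted prevalence of an outcome, computed
# per geographic area from survey responses alone. Toy data throughout.
import pandas as pd

survey = pd.DataFrame({
    "county": ["A", "A", "A", "B", "B"],
    "weight": [1.2, 0.8, 1.0, 2.0, 1.5],  # survey design weights
    "smoker": [1, 0, 1, 0, 1],            # 1 = respondent reports outcome
})

def direct_estimate(group: pd.DataFrame) -> float:
    # Weighted prevalence: total weight of positive responses
    # divided by total weight in the area.
    return (group["weight"] * group["smoker"]).sum() / group["weight"].sum()

prevalence = survey.groupby("county").apply(direct_estimate)
print(prevalence)  # one prevalence estimate per county
```

The limitation the abstract alludes to follows directly: areas with few sampled respondents get noisy direct estimates, which is what indirect, model-based SAE methods try to remedy.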



Predicting Microbial Ontology and Pathogen Risk from Environmental Metadata with Large Language Models

Yoo, Hyunwoo, Rosen, Gail L.

arXiv.org Artificial Intelligence

Traditional machine learning models struggle to generalize in microbiome studies where only metadata is available, especially in small-sample settings or across studies with heterogeneous label formats. In this work, we explore the use of large language models (LLMs) to classify microbial samples into ontology categories such as EMPO 3 and related biological labels, as well as to predict pathogen contamination risk, specifically the presence of E. coli, using environmental metadata alone. We evaluate LLMs such as ChatGPT-4o, Claude 3.7 Sonnet, Grok-3, and LLaMA 4 in zero-shot and few-shot settings, comparing their performance against traditional models like Random Forests across multiple real-world datasets. Our results show that LLMs not only outperform baselines in ontology classification, but also demonstrate strong predictive ability for contamination risk, generalizing across sites and metadata distributions. These findings suggest that LLMs can effectively reason over sparse, heterogeneous biological metadata and offer a promising metadata-only approach for environmental microbiology and biosurveillance applications.
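
A minimal sketch of what zero-shot ontology classification from metadata alone could look like; the prompt wording, label subset, and model name below are assumptions, not the authors' exact setup:

```python
# Zero-shot EMPO 3 classification from metadata via an LLM (sketch).
# Prompt format and label list are illustrative, not the paper's protocol.
from openai import OpenAI

EMPO3_LABELS = [
    "Water (non-saline)", "Water (saline)", "Soil (non-saline)",
    "Sediment (non-saline)", "Animal distal gut", "Plant rhizosphere",
]  # illustrative subset of EMPO 3 categories

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_sample(metadata: dict) -> str:
    fields = "\n".join(f"{k}: {v}" for k, v in metadata.items())
    prompt = (
        "Classify this microbial sample into exactly one EMPO 3 category.\n"
        f"Allowed categories: {', '.join(EMPO3_LABELS)}\n\n"
        f"Sample metadata:\n{fields}\n\nAnswer with the category only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

print(classify_sample({"env_biome": "freshwater lake", "depth_m": 2.5}))
```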


Deep learning four decades of human migration

Gaskin, Thomas, Abel, Guy J.

arXiv.org Artificial Intelligence

We present a novel and detailed dataset on origin-destination annual migration flows and stocks between 230 countries and regions, spanning the period from 1990 to the present. Our flow estimates are further disaggregated by country of birth, providing a comprehensive picture of migration over the last 35 years. The estimates are obtained by training a deep recurrent neural network to learn flow patterns from 18 covariates for all countries, including geographic, economic, cultural, societal, and political information. The recurrent architecture of the neural network means that the entire past can influence current migration patterns, allowing us to learn long-range temporal correlations. By training an ensemble of neural networks and additionally pushing uncertainty on the covariates through the trained network, we obtain confidence bounds for all our estimates, allowing researchers to pinpoint the geographic regions most in need of additional data collection. We validate our approach on various test sets of unseen data, demonstrating that it significantly outperforms traditional methods that estimate five-year flows, while delivering a significant increase in temporal resolution. The model is fully open source: all training data, neural network weights, and training code are made public alongside the migration estimates, providing a valuable resource for future studies of human migration.
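
A minimal sketch of how ensemble predictions and covariate noise can be combined into confidence bounds, in the spirit of the abstract; the architecture, noise scale, and dimensions below are placeholders rather than the paper's actual model:

```python
# Confidence bounds from (a) an ensemble of trained networks and
# (b) noise pushed through the covariates. All components are toy.
import numpy as np
import torch
import torch.nn as nn

class FlowRNN(nn.Module):
    """Toy stand-in for the paper's recurrent flow model."""
    def __init__(self, n_covariates: int, hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(n_covariates, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):            # x: (batch, years, covariates)
        out, _ = self.rnn(x)
        return self.head(out)        # one flow estimate per year

ensemble = [FlowRNN(n_covariates=18) for _ in range(10)]  # assume trained

def confidence_bounds(x, covariate_std=0.05, n_draws=50):
    preds = []
    with torch.no_grad():
        for model in ensemble:
            for _ in range(n_draws):
                noisy = x + covariate_std * torch.randn_like(x)
                preds.append(model(noisy).numpy())
    preds = np.stack(preds)
    # 5th / 50th / 95th percentiles across ensemble members and noise draws
    return np.quantile(preds, [0.05, 0.5, 0.95], axis=0)

lo, med, hi = confidence_bounds(torch.randn(1, 35, 18))  # 35 years, 18 covariates
```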


Simulating Correlated Electrons with Symmetry-Enforced Normalizing Flows

Schuh, Dominic, Kreit, Janik, Berkowitz, Evan, Funcke, Lena, Luu, Thomas, Nicoli, Kim A., Rodekamp, Marcel

arXiv.org Artificial Intelligence

One of the most widely used theoretical frameworks for studying such strongly correlated systems is the Hubbard model [2-4], which captures the essential competition between electron kinetic energy and on-site interactions. Over the years, a variety of methods have been developed to analyze the Hubbard model. In the weak interaction regime, perturbative approaches provide valuable insights [5]. However, outside this regime, non-perturbative effects become significant, rendering perturbative techniques insufficient. In these regimes, Monte Carlo simulations become an indispensable tool (see, for example, [6-13] and references therein).
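
For reference, the competition the excerpt describes is between the two terms of the Hubbard Hamiltonian, written here in its textbook single-band form (the paper may use a different convention):

```latex
% Hopping amplitude t carries the kinetic energy; U is the on-site
% repulsion between opposite-spin electrons on the same site.
H = -t \sum_{\langle i,j \rangle, \sigma}
      \left( c^{\dagger}_{i\sigma} c_{j\sigma} + \mathrm{h.c.} \right)
    + U \sum_{i} n_{i\uparrow} n_{i\downarrow}
```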


Revisiting Noise in Natural Language Processing for Computational Social Science

Borenstein, Nadav

arXiv.org Artificial Intelligence

Computational Social Science (CSS) is an emerging field driven by the unprecedented availability of human-generated content for researchers. This field, however, presents a unique set of challenges due to the nature of the theories and datasets it explores, including highly subjective tasks and complex, unstructured textual corpora. Among these challenges, one of the less well-studied topics is the pervasive presence of noise. This thesis aims to address this gap in the literature by presenting a series of interconnected case studies that examine different manifestations of noise in CSS. These include character-level errors following the OCR processing of historical records, archaic language, inconsistencies in annotations for subjective and ambiguous tasks, and even noise and biases introduced by large language models during content generation. This thesis challenges the conventional notion that noise in CSS is inherently harmful or useless. Rather, it argues that certain forms of noise can encode meaningful information that is invaluable for advancing CSS research, such as the unique communication styles of individuals or the culture-dependent nature of datasets and tasks. Further, this thesis highlights the importance of nuance in dealing with noise and the considerations CSS researchers must address when encountering it, demonstrating that different types of noise require distinct strategies.


Simulating the Hubbard Model with Equivariant Normalizing Flows

Schuh, Dominic, Kreit, Janik, Berkowitz, Evan, Funcke, Lena, Luu, Thomas, Nicoli, Kim A., Rodekamp, Marcel

arXiv.org Artificial Intelligence

Generative models, particularly normalizing flows, have shown exceptional performance in learning probability distributions across various domains of physics, including statistical mechanics, collider physics, and lattice field theory. In the context of lattice field theory, normalizing flows have been successfully applied to accurately learn the Boltzmann distribution, enabling a range of tasks such as direct estimation of thermodynamic observables and sampling independent and identically distributed (i.i.d.) configurations. In this work, we present a proof-of-concept demonstration that normalizing flows can be used to learn the Boltzmann distribution for the Hubbard model. This model is widely employed to study the electronic structure of graphene and other carbon nanomaterials. State-of-the-art numerical simulations of the Hubbard model, such as those based on Hybrid Monte Carlo (HMC) methods, often suffer from ergodicity issues, potentially leading to biased estimates of physical observables. Our numerical experiments demonstrate that leveraging i.i.d. sampling from the normalizing flow effectively addresses these issues.
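
To illustrate the general technique, here is a minimal sketch of training a normalizing flow on a Boltzmann distribution p(phi) proportional to exp(-S(phi)), with a toy quartic action standing in for the Hubbard action; the paper's equivariant architecture is far more structured than this plain coupling flow:

```python
# Reverse-KL training of an affine-coupling flow toward exp(-S). Toy setup.
import torch
import torch.nn as nn

DIM = 8  # toy field with 8 degrees of freedom

def action(phi):  # placeholder action, not the Hubbard action
    return (0.5 * phi**2 + 0.25 * phi**4).sum(dim=-1)

class AffineCoupling(nn.Module):
    def __init__(self, dim, flip):
        super().__init__()
        self.flip = flip
        self.net = nn.Sequential(
            nn.Linear(dim // 2, 64), nn.Tanh(), nn.Linear(64, dim)
        )
    def forward(self, x):
        a, b = x.chunk(2, dim=-1)
        if self.flip:
            a, b = b, a
        s, t = self.net(a).chunk(2, dim=-1)
        s = torch.tanh(s)               # keep the scale bounded
        y = b * torch.exp(s) + t
        out = torch.cat([a, y] if not self.flip else [y, a], dim=-1)
        return out, s.sum(dim=-1)       # log|det Jacobian|

layers = nn.ModuleList([AffineCoupling(DIM, flip=i % 2 == 1) for i in range(6)])
opt = torch.optim.Adam(layers.parameters(), lr=1e-3)

for step in range(2000):
    z = torch.randn(256, DIM)           # base Gaussian sample
    logq = -0.5 * (z**2).sum(dim=-1)    # base log-density (up to a constant)
    x = z
    for layer in layers:
        x, logdet = layer(x)
        logq = logq - logdet            # change of variables
    loss = (logq + action(x)).mean()    # reverse KL to exp(-S)
    opt.zero_grad(); loss.backward(); opt.step()

# After training, a forward pass through the layers yields (approximately)
# i.i.d. samples from the Boltzmann distribution, sidestepping MCMC ergodicity.
```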